Backfitting for large scale crossed random effects regressions

Abstract

Regression models with crossed random effect errors can be very expensive to compute. The cost of both generalized least squares and Gibbs sampling can easily grow as N^(3/2) (or worse) for N observations. Papaspiliopoulos, Roberts and Zanella (Biometrika 107 (2020) 25–40) present a collapsed Gibbs sampler that costs O(N), but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs O(N). A critical part of the proof is in ensuring that the number of iterations required is O(1), which follows from keeping a certain matrix norm below 1 − δ for some δ > 0. Our conditions are greatly relaxed compared to those for the collapsed Gibbs sampler, though still strict. Empirically, the backfitting algorithm has a norm below 1 − δ under conditions that are less strict than those in our assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.
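To make the cost argument concrete, the following is a minimal sketch of generic backfitting for a two-factor crossed model, not the authors' algorithm. It assumes an intercept-only model y = mu + a[row] + b[col] + noise, holds mu fixed at the sample mean as a simplification, and treats the shrinkage parameters `lam_a` and `lam_b` (standing in for the ratios of error variance to random effect variance) as known. The function name and all parameters are hypothetical. Each sweep touches every observation a constant number of times, so one sweep costs O(N); the total cost is O(N) only when the number of sweeps stays bounded.

```python
import numpy as np

def backfit_crossed(y, rows, cols, lam_a=1.0, lam_b=1.0, tol=1e-8, max_iter=500):
    """Alternately update shrunken row and column effects (illustrative sketch).

    Minimizes sum((y - mu - a[rows] - b[cols])**2) + lam_a*|a|^2 + lam_b*|b|^2
    over a and b, with mu held fixed at the mean of y (a simplification).
    """
    n_rows, n_cols = rows.max() + 1, cols.max() + 1
    row_counts = np.bincount(rows, minlength=n_rows)
    col_counts = np.bincount(cols, minlength=n_cols)
    mu = y.mean()
    a = np.zeros(n_rows)
    b = np.zeros(n_cols)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        a_old, b_old = a, b
        # Row update: shrunken mean of residuals within each row level, O(N).
        resid = y - mu - b[cols]
        a = np.bincount(rows, weights=resid, minlength=n_rows) / (row_counts + lam_a)
        # Column update: same idea using the fresh row effects, O(N).
        resid = y - mu - a[rows]
        b = np.bincount(cols, weights=resid, minlength=n_cols) / (col_counts + lam_b)
        # Stop once a full sweep no longer changes the effects.
        if max(np.abs(a - a_old).max(), np.abs(b - b_old).max()) < tol:
            break
    return mu, a, b, n_iter
```

How many sweeps are needed depends on how strongly the row and column updates contract, which is the role played by the matrix norm bound described in the abstract.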

Similar resources

Laws of Large Numbers for Random Linear

The computational solution of large scale linear programming problems presents various difficulties. One of the difficulties is to ensure numerical stability. There is another difficulty of a different nature, namely that the original data contain errors as well. In this paper, we show that the effect of the random errors in the original data has a diminishing tendency for the optimal value as the...

Efficient moment calculations for variance components in large unbalanced crossed random effects models

Large crossed data sets, described by generalized linear mixed models, have become increasingly common and provide challenges for statistical analysis. At very large sizes it becomes desirable to have the computational costs of estimation, inference and prediction (both space and time) grow at most linearly with sample size. Both traditional maximum likelihood estimation and numerous Markov cha...

"Influence sketching": Finding influential samples in large-scale regressions

There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial attack. In order to solve the “needle in the haystack” problem of which samples to inspect, we develop a new scalable version of Cook’s distance, a classical s...

Random Access Support for Large Scale VoD

The implementation of an interactive Video on Demand service is conventionally expensive because each viewer must be allocated a video stream. On the other hand, video streams can be multicast to a number of viewers, reducing the system resources required. This paper proposes a scheme, called Random Access PMC (RAPMC), which reduces the requirements for supporting an interactive VOD service. It is shown tha...

Journal

Journal title: Annals of Statistics

Year: 2022

ISSN: 0090-5364, 2168-8966

DOI: https://doi.org/10.1214/21-aos2121